智能论文笔记

UniLayout: Taming Unified Sequence-to-Sequence Transformers for Graphic Layout Generation

Zhaoyun Jiang , Huayu Deng , Zhongkai Wu , Jiaqi Guo , Shizhao Sun , Vuksan Mijovic , Zijiang Yang , Jian-Guang Lou , Dongmei Zhang

分类：计算机视觉

2022-08-17

为了满足各种用户需求，近年来对图形布局的不同子任务进行了深入探讨。现有研究通常提出具有不同投入输出格式，专用模型体系结构和不同学习方法的任务特异性方法。但是，这些专业的方法使得适应了看不见的子任务，阻碍了不同子任务之间的知识共享，并且与设计通用模型的趋势背道而驰。在这项工作中，我们提出了Unilayout，该Unilayout以统一的方式处理图形布局生成的不同子任务。首先，我们统一地表示子任务的各种输入和输出作为令牌序列。然后，基于统一的序列格式，我们自然利用具有不同子任务的变压器的相同的编码器架构。此外，基于上述两种统一，我们进一步开发了一个同时支持所有子任务的单个模型。在两个公共数据集上的实验表明，尽管简单，单层虽然明显优于先前的特定于任务的方法。

translated by 谷歌翻译

Positional-Spectral-Temporal Attention in 3D Convolutional Neural Networks for EEG Emotion Recognition

Jiyao Liu , Yanxi Zhao , Hao Wu , Dongmei Jiang

分类：计算机视觉

2021-10-13

认识到人类的感情在日常沟通中发挥着关键作用。神经科学已经证明，不同的情绪状态存在于不同脑区，脑电图频带和颞戳中不同程度的激活。在本文中，我们提出了一种新颖的结构来探索情感认可的信息脑电图。所提出的模块，由PST-Integn表示，由位置，光谱和颞件注意力模块组成，用于探索更多辨别性EEG特征。具体地，位置注意模块是捕获在空间尺寸中的不同情绪刺激的激活区域。光谱和时间注意力模块分别分配不同频带和时间片的权重。我们的方法是自适应的，也可以符合其作为插入式模块的3D卷积神经网络（3D-CNN）。我们在两个现实世界数据集进行实验。 3D-CNN结合我们的模块实现了有希望的结果，并证明了PST-关注能够从脑电图中捕获稳定的情感识别模式。

translated by 谷歌翻译

Rethinking Mobile Block for Efficient Neural Models

Jiangning Zhang , Xiangtai Li , Jian Li , Liang Liu , Zhucun Xue , Boshen Zhang , Zhengkai Jiang , Tianxin Huang , Yabiao Wang , Chengjie Wang

分类：计算机视觉

2023-01-03

This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.

translated by 谷歌翻译

A Theory of Human-Like Few-Shot Learning

Zhiying Jiang , Rui Wang , Dongbo Bu , Ming Li

分类：机器学习

2023-01-03

We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from von-Neuman-Landauer's principle. modelling human learning is difficult as how people learn varies from one to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models including Free Energy Principle and Bayesian Program Learning that model such learning, approximate our theory, under Church-Turing thesis. We find that deep generative model like variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models including deep neural networks, for image recognition, low resource language processing, and character recognition.

translated by 谷歌翻译

Vocabulary-informed Zero-shot and Open-set Learning

Yanwei Fu , Xiaomei Wang , Hanze Dong , Yu-Gang Jiang , Meng Wang , Xiangyang Xue , Leonid Sigal

分类：计算机视觉 | 机器学习

2023-01-03

Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.

translated by 谷歌翻译

Faster Approximate Dynamic Programming by Freezing Slow States

Yijia Wang , Daniel R. Jiang

分类：人工智能 | 机器学习

2023-01-03

We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure, meaning that certain parts of the state space move "fast" (and in a sense, are more influential) while other parts transition more "slowly." Such structure is common in real-world problems where sequential decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. Examples include: (1) service allocation for a multi-class queue with (slowly varying) stochastic costs, (2) a restless multi-armed bandit with an environmental state, and (3) energy demand response, where both day-ahead and real-time prices play a role in the firm's revenue. Models that fully capture these problems often result in MDPs with large state spaces and large effective time horizons (due to frequent decisions), rendering them computationally intractable. We propose an approximate dynamic programming algorithmic framework based on the idea of "freezing" the slow states, solving a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the upper-level MDP). We also extend the technique to a function approximation setting, where a feature-based linear architecture is used. On the theoretical side, we analyze the regret incurred by each variant of our frozen-state approach. Finally, we give empirical evidence that the frozen-state approach generates effective policies using just a fraction of the computational cost, while illustrating that simply omitting slow states from the decision modeling is often not a viable heuristic.

translated by 谷歌翻译

Muse: Text-To-Image Generation via Masked Generative Transformers

Huiwen Chang , Han Zhang , Jarred Barber , AJ Maschinot , Jose Lezama , Lu Jiang , Ming-Hsuan Yang , Kevin Murphy , William T. Freeman , Michael Rubinstein

分类：计算机视觉 | 人工智能 | 机器学习

2023-01-02

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io

translated by 谷歌翻译

Large-Scale Traffic Signal Control by a Nash Deep Q-network Approach

Yuli. Zhang , Shangbo. Wang , Ruiyuan. Jiang

分类：人工智能 | 机器学习

2023-01-02

Reinforcement Learning (RL) is currently one of the most commonly used techniques for traffic signal control (TSC), which can adaptively adjusted traffic signal phase and duration according to real-time traffic data. However, a fully centralized RL approach is beset with difficulties in a multi-network scenario because of exponential growth in state-action space with increasing intersections. Multi-agent reinforcement learning (MARL) can overcome the high-dimension problem by employing the global control of each local RL agent, but it also brings new challenges, such as the failure of convergence caused by the non-stationary Markov Decision Process (MDP). In this paper, we introduce an off-policy nash deep Q-Network (OPNDQN) algorithm, which mitigates the weakness of both fully centralized and MARL approaches. The OPNDQN algorithm solves the problem that traditional algorithms cannot be used in large state-action space traffic models by utilizing a fictitious game approach at each iteration to find the nash equilibrium among neighboring intersections, from which no intersection has incentive to unilaterally deviate. One of main advantages of OPNDQN is to mitigate the non-stationarity of multi-agent Markov process because it considers the mutual influence among neighboring intersections by sharing their actions. On the other hand, for training a large traffic network, the convergence rate of OPNDQN is higher than that of existing MARL approaches because it does not incorporate all state information of each agent. We conduct an extensive experiments by using Simulation of Urban MObility simulator (SUMO), and show the dominant superiority of OPNDQN over several existing MARL approaches in terms of average queue length, episode training reward and average waiting time.

translated by 谷歌翻译

FedICT: Federated Multi-task Distillation for Multi-access Edge Computing

Zhiyuan Wu , Sheng Sun , Yuwei Wang , Min Liu , Xuefeng Jiang , Bo Gao

分类：机器学习

2023-01-01

The growing interest in intelligent services and privacy protection for mobile devices has given rise to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) is proposed to train related but personalized ML models for different devices, whereas previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can simultaneously enable efficient communication and model heterogeneity among clients, whereas existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT) is proposed. FedICT direct local-global knowledge aloof during bi-directional distillation processes between clients and the server, aiming to enable multi-task clients while alleviating client drift derived from divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD is proposed to reinforce the clients' fitting of local data by introducing prior knowledge of local data distributions. Moreover, LKA is proposed to correct the distillation loss of the server, making the transferred local knowledge better match the generalized representation. Experiments on three datasets show that FedICT significantly outperforms all compared benchmarks in various data heterogeneous and model architecture settings, achieving improved accuracy with less than 1.2% training communication overhead compared with FedAvg and no more than 75% training communication round compared with FedGKT.

translated by 谷歌翻译

A Multi-Source Information Learning Framework for Airbnb Price Prediction

Lu Jiang , Yuanhan Li , Na Luo , Jianan Wang , Qiao Ning

分类：机器学习

2023-01-01

With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.

translated by 谷歌翻译